Abstract: In this survey paper, we will present a number of core algorithmic questions 
concerning several transitive reduction problems on networks that have applications in 
network synthesis and analysis involving cellular processes. Our starting point will be the 
so-called minimum equivalent digraph problem, a classic computational problem in 
combinatorial algorithms. We will subsequently consider a few non-trivial extensions or 
generalizations of this problem motivated by applications in systems biology. We will then 
discuss the applications of these algorithmic methodologies in the context of three major 
biological research questions: synthesizing and simplifying signal transduction networks, 
analyzing disease networks, and measuring redundancy of biological networks. 

Keywords: transitive reduction; minimum equivalent digraph; network synthesis; disease 

1. Introduction 

In this survey paper, we review several transitive reduction problems on networks that have 
applications in network synthesis and analysis involving cellular processes. Investigations of problems 
of these types, which involve formal frameworks of very similar combinatorial nature, have been 
carried out by two independent communities of researchers, one being the theoretical computer 
science and computer networking research community and the other being the biological network 
research community. However, the published literature suggests that there is minimal 
cooperation between these groups. The purpose of this survey is to promote a constructive dialogue 
between these two research communities working on similar problems, so that intrigued biologists may 
probe further and learn new techniques from the perspective of formal analysis of algorithms and 
intrigued computer scientists may probe further to learn new terminologies and applications in biology. 
Following the general guidelines of this special issue, we first present the formal algorithmic ideas 
separately from their applications and subsequently discuss the applications that involve these formal 
frameworks. 

Minimum equivalent digraph is a classical computational problem (cf. [1]) with several recent 
extensions motivated by applications in social sciences and systems biology. A formal definition of the 
basic equivalent digraph problem is as follows. 

Problem name: Minimum equivalent digraph (Min-Ed) 
Input: a directed graph (digraph) G = (V, E). 
Definition: for a digraph (V, E), the transitive closure of E is the relation $\stackrel{E}{\Rightarrow}$ on V × V defined as 

$$u_i \stackrel{E}{\Rightarrow} u_j \;\equiv\; E \text{ contains a path from } u_i \text{ to } u_j$$

Valid solution: A ⊆ E such that $\stackrel{A}{\Rightarrow}$ is equal to $\stackrel{E}{\Rightarrow}$. 
Objective: minimize |A|. 
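
The definition above can be checked mechanically on small inputs. The following Python sketch is only illustrative and is not taken from the cited works: the function names `transitive_closure`, `is_valid_solution` and `brute_force_min_ed` are hypothetical, paths are assumed to be non-empty, and the exhaustive search is exponential, so it is meant for very small graphs only.

```python
from itertools import combinations


def transitive_closure(nodes, edges):
    """Return all ordered pairs (u, v) such that `edges` contains a non-empty path from u to v."""
    succ = {u: set() for u in nodes}
    for u, v in edges:
        succ[u].add(v)
    closure = set()
    for s in nodes:
        seen, stack = set(), list(succ[s])
        while stack:
            x = stack.pop()
            if x in seen:
                continue
            seen.add(x)
            stack.extend(succ[x])
        closure.update((s, t) for t in seen)
    return closure


def is_valid_solution(nodes, E, A):
    """A is valid if it is a subset of E with the same transitive closure as E."""
    E, A = set(E), set(A)
    return A <= E and transitive_closure(nodes, A) == transitive_closure(nodes, E)


def brute_force_min_ed(nodes, E):
    """Smallest valid subset of E, found by trying subsets in order of increasing size."""
    E = list(E)
    for k in range(len(E) + 1):
        for A in combinations(E, k):
            if is_valid_solution(nodes, E, A):
                return set(A)
```

On a directed triangle with one extra chord, the search keeps the three cycle edges and drops the chord.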

A complementary problem is the Max-Ed problem whose objective is to maximize |E\A|. Even 
though the complexity of finding an exact solution is the same for both Min-Ed and Max-Ed, the same 
may not necessarily be true for their approximate solutions (in the same manner as for node cover and 
independent set problems for general graphs [2]). For example, suppose that we have a graph with 
1,000 edges and an exact solution for Min-Ed and Max-Ed with 490 edges. Suppose that an 
approximation algorithm for Min-Ed guarantees that we will find a solution with at most 980 edges. 
Thus, this approximation algorithm provides an approximation ratio of 980/490 = 2 for Min-Ed. 
However, the same algorithm for Max-Ed can have an approximation ratio as large as 

$$\frac{1000-490}{1000-980} = \frac{510}{20} = 25.5 \qquad (1)$$
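
The arithmetic of this example can be reproduced with a few lines of Python. This is only a sketch; the helper names `min_ed_ratio` and `max_ed_ratio` are hypothetical and simply encode the two objectives |A| and |E| - |A|.

```python
from fractions import Fraction


def min_ed_ratio(solution_size, optimum_size):
    """Ratio for the minimization objective |A|: achieved size over optimal size."""
    return Fraction(solution_size, optimum_size)


def max_ed_ratio(total_edges, solution_size, optimum_size):
    """Ratio for the maximization objective |E| - |A|: optimal deletions over achieved deletions."""
    return Fraction(total_edges - optimum_size, total_edges - solution_size)


print(min_ed_ratio(980, 490))                # 2
print(float(max_ed_ratio(1000, 980, 490)))   # 25.5
```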

Skipping the condition A ⊆ E in the definition of Min-Ed (or Max-Ed) yields the so-called 
transitive reduction (Tr) problem, which was solved in polynomial time by Aho, Garey and Ullman [3]. 
See Figure 1 for an illustration of valid solutions of Min-Ed. 

1.1. Three Extensions of the Basic Version 

In this subsection, we discuss three non-trivial extensions of the basic problem that have been 
formulated based on their applications. We will review in more detail the applications of the basic 
version as well as the other extensions separately in Section 4. 

Figure 1. Illustrations of two valid solutions of Min-Ed on an input graph: (a) The original 
graph G = (V, E); (b, c) Two valid solutions $(V, A_1)$ and $(V, A_2)$ of Min-Ed for G. 
The solution in (c) is optimal since it has fewer edges. 

1.1.1. Min-Ed and Max-Ed with Critical Edges 

This extension is the same as Min-Ed or Max-Ed except that a given subset D of edges must be 
present in any valid solution. Formally, we are given D ⊆ E as part of the input and the condition "A ⊆ E" 
is changed to D ⊆ A ⊆ E. Let us denote this version as critical-Min-Ed and critical-Max-Ed, 
as appropriate. As we will see subsequently, this extension is quite non-trivial if one desires a good 
approximate solution. 

1.1.2. Weighted Version of Min-Ed or Max-Ed 

In this version, each edge has a weight (positive real number) and an optimal valid solution must 
have the minimum possible total edge weight. Formally, we have a weight function 
$w: E \to \mathbb{R}^+$ and the goal is either to minimize $\sum_{e \in A} w(e)$ or to maximize 
$\sum_{e \in E} w(e) - \sum_{e \in A} w(e)$. Let us denote this version as weighted-Min-Ed or 
weighted-Max-Ed, as appropriate. Obviously, the basic version is a special case of this weighted 
version when every edge weight is 1. 

1.1.3. Binary Transitive Reduction (Btr) 

This extension is a generalization of the basic version with critical edges and is described as 
follows [4–7]. We have an edge-labeling function $\ell: E \to \{-1, 1\}$. The label or parity of a path 
$P = (u_0, u_1, \ldots, u_k)$ is derived from the labels of its edges and given by 
$\ell(P) = \prod_{i=1}^{k} \ell(u_{i-1}, u_i)$. The transitive closure relation is now generalized as 

$$\stackrel{\ell(E)}{\Rightarrow} \;=\; \{(u_i, u_j, q) : \exists \text{ path } P \text{ using edges in } E \text{ from } u_i \text{ to } u_j \text{ and } \ell(P) = q\}.$$

Then, A is a binary transitive reduction of E with a required subset D if D ⊆ A ⊆ E and 
$\stackrel{\ell(A)}{\Rightarrow} = \stackrel{\ell(E)}{\Rightarrow}$. Obviously, the basic version with critical edges is a special case of Btr 
when every edge label is 1. There are two (maximization and minimization) objective functions 
corresponding to the two generalizations of the basic version Min-Ed and Max-Ed; they will be 
denoted by Min-Btr and Max-Btr, respectively. We will use the notation $u_i \stackrel{p,E}{\Rightarrow} u_j$ to indicate a path 
from node $u_i$ to node $u_j$ of parity $p \in \{-1, 1\}$. 
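
The generalized closure can be computed by a search over (node, parity) pairs. The sketch below is illustrative only and rests on two assumptions that the text does not settle: paths are allowed to repeat nodes, and empty paths are not counted. The names `parity_closure` and `is_valid_btr` are hypothetical.

```python
from collections import deque


def parity_closure(nodes, labeled_edges):
    """labeled_edges maps (u, v) to +1 or -1; returns all triples (u, v, q) of the closure."""
    succ = {u: [] for u in nodes}
    for (u, v), lab in labeled_edges.items():
        succ[u].append((v, lab))
    closure = set()
    for s in nodes:
        seen = set(succ[s])            # states (node, parity) reached from s by one edge
        queue = deque(seen)
        while queue:
            x, q = queue.popleft()
            for v, lab in succ[x]:
                state = (v, q * lab)   # extending a path multiplies the parities
                if state not in seen:
                    seen.add(state)
                    queue.append(state)
        closure.update((s, v, q) for v, q in seen)
    return closure


def is_valid_btr(nodes, labeled_edges, required, candidate):
    """Check D ⊆ A ⊆ E and equality of the two parity closures."""
    candidate = set(candidate)
    if not (set(required) <= candidate <= set(labeled_edges)):
        return False
    sub = {e: labeled_edges[e] for e in candidate}
    return parity_closure(nodes, sub) == parity_closure(nodes, labeled_edges)
```

An edge can be dropped only if another path of the same parity connects its endpoints; a path of the opposite parity does not compensate for it.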

The relationships between various versions of the basic equivalent digraph problem are as follows: 

Min-Ed < weighted-Min-Ed 
Max-Ed < weighted-Max-Ed 
Min-Ed < critical-Min-Ed < Min-Btr 
Max-Ed < critical-Max-Ed < Max-Btr 

where A < B means problem A is a special case of problem B. The relationships between the problem 
weighted-Min-Ed and the problems critical-Min-Ed and Min-Btr (and, similarly, between the 
problem weighted-Max-Ed and the problems critical-Max-Ed and Max-Btr) are not completely 
known, though it is possible to design approximation algorithms for critical-Min-Ed and Min-Btr 
based on approximation algorithms for weighted-Min-Ed. 

We review the following standard definitions in approximation algorithms theory. An ε-approximate 
solution (or simply an ε-approximation) of a minimization (respectively, maximization) problem is a 
solution computed in polynomial time with an objective value no larger than ε times (respectively, 
no smaller than 1/ε times) the value of the optimum; an algorithm of performance or approximation 
ratio ε produces an ε-approximate solution. A problem is APX-hard if there exists an ε > 1 such that 
no polynomial-time algorithm has an approximation ratio of ε unless P = NP. The notation OPT(G) 
(or simply OPT when G is clear from the context) will always denote the objective value of an optimal 
solution for the problem under consideration. We assume that the reader is familiar with the basic 
concepts of design and analysis of algorithms found in graduate level algorithms textbooks such as 
[2,8], and basic concepts of computational biology found in standard textbooks such as [9,10]. 

2. Summary of Known Algorithmic and Inapproximability Results 

In this section, we briefly review known algorithmic and inapproximability results for the various 
equivalent digraph and transitive reduction problems defined in the previous section, leaving a more 
detailed description of algorithmic techniques used to obtain these results in the next section. 

The algorithmic research work on Min-Ed was initiated by Moyles and Thomson [1] who described 
an efficient polynomial-time reduction of this problem for an arbitrary graph to that for a strongly 
connected graph, followed by an exact but exponential time algorithm for strongly connected graphs. 
Subsequently, an approximation algorithm for Min-Ed was detailed by Khuller, Raghavachari and 
Young [11] with an approximation ratio of $\frac{\pi^2}{6} - \frac{1}{36} + \varepsilon \approx 1.617 + \varepsilon$ 
(for any constant ε > 0), which was improved to an approximation algorithm with an approximation 
ratio of 3/2 independently by Vetta [12] and by Berman, DasGupta and Karpinski [13]. Except [13], 
none of these approximation algorithms will generalize directly to critical-Min-Ed with the same 
approximation ratio. The only non-trivial approximation algorithm known for either Max-Ed or 
critical-Max-Ed is a 2-approximation algorithm described in [13]. 

For weighted-Min-Ed, Frederickson and JaJa [14] designed a 2-approximation algorithm using an 
algorithm for minimum cost rooted arborescence due to Edmonds [15] and Karp [16]. Basically, 
it suffices to find a minimum cost in-arborescence and out-arborescence with respect to an arbitrary 
root node v ∈ V and take the union of all the edges in these two arborescences as the approximate solution. 

Albert et al. [4] showed how to convert any algorithm for Min-Ed with an approximation ratio p to 
an algorithm for critical-Min-Ed with an approximation ratio of 3 − 2/p. They also provided a 
2-approximation for Min-Btr, but in fact, a minor modification of their method and analysis as outlined 
in [13] yields a 3/2-approximation. Other heuristics for these problems were investigated in [5,6] but 
none of these heuristics guarantees a better approximation ratio. Table 1 shows a theoretical 
comparison of running times and approximation ratios of some of the known algorithms for the 
transitive reduction problems. Unfortunately, a systematic comparative empirical evaluation of these 
algorithmic approaches is not available in the published literature. However, implementations of 
several individual algorithmic approaches are available. For example, Kachalo et al. [6] 
provided a software package called NET-SYNTHESIS which used some of the algorithmic approaches 
described in Sections 3.2 and 3.4, and Milanovic et al. [17] discussed two meta-heuristic approaches to 
solve a more general version of the Min-Btr problem. 

On the inapproximability side, Papadimitriou [18] left it as an exercise to show that Min-Ed is 
NP-hard. Subsequently, Khuller, Raghavachari and Young [11] provided a formal proof of both 
NP-hardness and APX-hardness of Min-Ed for arbitrary graphs. Motivated by their cycle contraction 
method in [11], they were interested in the complexity of the problem when there is an upper bound γ 
on the length of any cycle in the input graph. In [19] the authors showed that Min-Ed can be solved in 
polynomial time if γ = 3, Min-Ed is NP-hard if γ = 5, and Min-Ed is APX-hard if γ ≥ 17. Reference [13] 
improved the APX-hardness result to show that both Min-Ed and Max-Ed are APX-hard even when 
γ ≥ 5. The exact complexity of both Min-Ed and Max-Ed when γ = 4 is still unresolved. 



Table 1. Theoretical comparison of worst-case performance of some of the algorithms for 
the transitive reduction problems. 

Problem name     | Algorithmic approach                                | Worst-case running time using straightforward implementation | Approximation ratio 
Min-Ed           | Khuller, Raghavachari and Young [11]                | O(n^?)                                                        | 1.617 + ε 
Min-Ed           | Vetta [12]; Berman, DasGupta and Karpinski [13]     | O(n log n)                                                    | 3/2 
Max-Ed           | Berman, DasGupta and Karpinski [13]                 | O(n log n)                                                    | 2 
critical-Min-Ed  | Khuller, Raghavachari and Young [11]                |                                                               | 2.617 + ε 
critical-Min-Ed  | Berman, DasGupta and Karpinski [13]                 | O(n log n)                                                    | 3/2 
critical-Min-Ed  | Frederickson and JaJa [14]                          | O(n)                                                          | 2 
critical-Min-Ed  | Albert et al. [4]                                   | O(n^?)                                                        | 5/3 
critical-Max-Ed  | Berman, DasGupta and Karpinski [13]                 | O(n log n)                                                    | 2 
weighted-Min-Ed  | Frederickson and JaJa [14]                          | O(n)                                                          | 2 
Min-Btr          | Albert et al. [4]                                   | O(n^?)                                                        | 2 
Min-Btr          | Berman, DasGupta and Karpinski [13]                 | O(n log n)                                                    | 3/2 
Max-Btr          | Berman, DasGupta and Karpinski [13]                 | O(n log n)                                                    | 2 



3. Review of a Few Algorithmic Techniques Used for Transitive Reduction Problems 

In this section, we review a few key algorithmic techniques that have been used in the literature to 
investigate algorithmic complexities of various versions of the transitive reduction problem. Our goal 
is not to provide every technical detail involving these methods, but rather to bring out salient features 
of these techniques in a way that may be understood by practitioners as well. 

3.1. From General Graphs to Strongly Connected Graphs 

Recall that a digraph (V, E) is strongly connected if and only if, for every pair of nodes $u_i$ and $u_j$, 
both the paths $u_i \stackrel{E}{\Rightarrow} u_j$ and $u_j \stackrel{E}{\Rightarrow} u_i$ exist. A reduction that was originally suggested in [1] and has 
been implicit in all subsequent works is the observation that an ε-approximation algorithm for 
critical-Min-Ed and critical-Max-Ed when the given graph is strongly connected also implies an 
ε-approximation algorithm for the same problem on arbitrary digraphs. To understand why this is true, 
we first note that all four problems (Min-Ed, Max-Ed, critical-Min-Ed and critical-Max-Ed) can be 
solved easily in polynomial time using the following greedy approach if the input graph G = (V, E) is 
a directed acyclic graph (Dag) with D ⊆ E as the set of required edges (∅ is the standard mathematical 
symbol for the empty set): 

Compute a topological ordering $u_1, u_2, \ldots, u_n$ of the nodes of G (* thus, if $(u_i, u_j) \in E$ then i < j *) 
E' = E; A = ∅ 
for i = n, n − 1, n − 2, ..., 1 do 
    for j = n, n − 1, n − 2, ..., i + 1 do 
        if $(u_i, u_j) \in E$ then 
            if $(u_i, u_j) \in D$ then add the edge $(u_i, u_j)$ to A 
            else if the path $u_i \Rightarrow u_j$ does not exist then add the edge $(u_i, u_j)$ to A 
Return (V, A) as the solution 
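
A minimal Python sketch of this greedy rule is given below. It is not the implementation of [1] or [4]; the name `dag_min_ed` is hypothetical, and the path test is read here, as an assumption, as "there is no path from $u_i$ to $u_j$ in E other than the edge $(u_i, u_j)$ itself". Under that reading the result does not depend on the order in which the edges are examined, so the topological ordering is omitted.

```python
def dag_min_ed(nodes, E, D=frozenset()):
    """Keep every required edge and every edge of the DAG that has no alternative path."""
    E = set(E)
    succ = {u: set() for u in nodes}
    for u, v in E:
        succ[u].add(v)

    def has_alternative_path(u, v):
        # In a DAG a path from u to v can use the edge (u, v) only as its first step,
        # so it suffices to start the search from the other successors of u.
        stack = [x for x in succ[u] if x != v]
        seen = set()
        while stack:
            x = stack.pop()
            if x == v:
                return True
            if x in seen:
                continue
            seen.add(x)
            stack.extend(succ[x])
        return False

    return {(u, v) for (u, v) in E if (u, v) in D or not has_alternative_path(u, v)}
```

For example, on the edges (1,2), (2,3), (1,3), (3,4), (1,4) the sketch keeps only the chain 1, 2, 3, 4, and additionally keeps (1,4) when that edge is required.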

It is easy to implement the above algorithm to run in polynomial time. Now, suppose that the input 
graph G is not a Dag and consider the strong component graph G' = (V', E') of G: 

V' = {C | C is a strongly connected component of G} 
E' = {(C, C') | C, C' ∈ V', C ≠ C', and $(u_i, u_j) \in E$ for some $u_i \in C$ and $u_j \in C'$} 

It is easy to see that G' is a Dag and can be found in O(|V| + |E|) time [8]. Let A' be the solution 
of our problem on G'. Suppose that we have an ε-approximation algorithm for critical-Min-Ed or 
critical-Max-Ed on each strongly connected component of G. Then, the union of the edges in this 
ε-approximation for every strongly connected component of G together with the edges in A' provides 
an ε-approximation for the entire graph G. 
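
The strong component graph can be computed, for instance, with the standard two-pass depth-first search (Kosaraju's method). The text only cites [8] for the linear-time computation, so the choice of method and the names `strong_components` and `component_graph` below are assumptions made for illustration.

```python
def strong_components(nodes, edges):
    """Map every node to a representative of its strongly connected component."""
    succ = {u: [] for u in nodes}
    pred = {u: [] for u in nodes}
    for u, v in edges:
        succ[u].append(v)
        pred[v].append(u)

    # Pass 1: record nodes in order of DFS completion on G (iterative DFS).
    order, seen = [], set()
    for s in nodes:
        if s in seen:
            continue
        seen.add(s)
        stack = [(s, iter(succ[s]))]
        while stack:
            u, it = stack[-1]
            for v in it:
                if v not in seen:
                    seen.add(v)
                    stack.append((v, iter(succ[v])))
                    break
            else:                      # all successors of u explored
                order.append(u)
                stack.pop()

    # Pass 2: sweep the reversed graph in decreasing completion order.
    comp = {}
    for s in reversed(order):
        if s in comp:
            continue
        comp[s] = s
        stack = [s]
        while stack:
            u = stack.pop()
            for v in pred[u]:
                if v not in comp:
                    comp[v] = s
                    stack.append(v)
    return comp


def component_graph(nodes, edges):
    """Return (comp, V', E') where E' joins distinct components linked by some edge of E."""
    comp = strong_components(nodes, edges)
    v_prime = set(comp.values())
    e_prime = {(comp[u], comp[v]) for u, v in edges if comp[u] != comp[v]}
    return comp, v_prime, e_prime
```

The Dag returned as (V', E') can then be handled by the greedy rule for Dags, while each component is passed to an approximation algorithm for strongly connected graphs.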

For Min-Btr or Max-Btr, Albert et al. [4] provide a more complex reduction to show that an 
ε-approximation algorithm for strongly connected graphs also implies an ε-approximation algorithm 
for arbitrary digraphs. To achieve this, each strongly connected component is replaced by a graph with 
constantly many edges and nodes (called a "gadget" in [4]) and then these graphs are connected 
appropriately such that the resulting graph is a Dag and an ε-approximation for the entire graph 
can be recovered using an exact optimal solution of the Dag and ε-approximations of the strongly 
connected components. 

Thus, for the remainder of this section, we assume without loss of generality that the input graph G 
is strongly connected. 

3.2. The Cycle Contraction Method [11] 

Consider an input graph G = (V, E) for the Min-Ed problem and suppose that G has a directed 
Hamiltonian cycle, i.e., a (directed) cycle that contains every node exactly once. Then clearly the edges 
in this cycle constitute an optimal solution of |V| edges. This intuition suggests a general strategy of 
repeatedly finding a longest cycle in the given graph, selecting the edges in this cycle and modifying 
the graph to reflect the selection of edges until we reach a valid solution. 

However, finding a directed Hamiltonian cycle or the longest cycle is in general NP-hard [2]. 
To circumvent the NP-hardness issue, Khuller, Raghavachari and Young [11] designed the 
following "cycle contraction" approach. Contraction of an edge $(v_i, v_j)$ is the act of 
merging the two nodes $v_i$ and $v_j$ into a new single node $v_{ij}$ and deleting any resulting self-loops or 
multi-edges. Similarly, contraction of a cycle is defined as the contraction of every edge of the cycle; 
see Figure 2 for an illustration. Note that if c is a constant then one can easily check in polynomial time 
if a graph has a cycle of at least c edges. The algorithm, parameterized by a constant c > 3 to be chosen 
by the user, now proceeds as follows: 

for i = c, c − 1, ..., 4 do 
    while (the graph contains a cycle of at least i edges) do 
        Find a cycle C of at least i edges 
        Select the edges in C and contract C 
    endwhile 
endfor 
(* now the graph contains no cycle of more than 3 edges *) 
Solve Min-Ed on the reduced graph exactly using the algorithm in [19] and select the edges in 
this exact solution. 
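
The contraction step used inside the loop can be sketched as follows. This covers only the contraction of a cycle that has already been found, not the search for long cycles; the function name `contract_cycle` and the caller-supplied name of the merged node are hypothetical.

```python
def contract_cycle(nodes, edges, cycle, merged_name):
    """Merge all nodes of `cycle` into `merged_name`, dropping self-loops and multi-edges."""
    on_cycle = set(cycle)

    def rename(x):
        return merged_name if x in on_cycle else x

    new_nodes = {rename(u) for u in nodes}
    # Building a set removes parallel edges created by the merge.
    new_edges = {(rename(u), rename(v)) for u, v in edges}
    # Edges whose endpoints were both on the cycle become self-loops and are deleted.
    new_edges = {(u, v) for u, v in new_edges if u != v}
    return new_nodes, new_edges
```

For instance, contracting the cycle 1, 2, 3 in a graph that also has the edges (3,4), (2,4) and (4,1) leaves two nodes joined by one edge in each direction.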

Figure 2. Illustration of a cycle contraction: (a) shows the original graph and (b) shows the 
graph after the cycle $u_1, u_2, u_3, u_4, u_5, u_6, u_1$ has been contracted. 

It was shown in [11] that the above algorithm for Min-Ed returns a valid solution containing y 
edges where 

$$y \le \left(\frac{\pi^2}{6} - \frac{1}{36} + \frac{1}{c(c-1)}\right)\mathrm{OPT}(G) \approx \left(1.617 + \frac{1}{c(c-1)}\right)\mathrm{OPT}(G).$$

The above approach can also be applied to critical-Min-Ed by simply adding all the edges 
from the required set of edges D to the solution. The number of edges z in the resulting solution 
of critical-Min-Ed satisfies 

$$z \le \left(1 + \frac{\pi^2}{6} - \frac{1}{36} + \frac{1}{c(c-1)}\right)\mathrm{OPT}(G) \approx \left(2.617 + \frac{1}{c(c-1)}\right)\mathrm{OPT}(G)$$

since obviously |D| ≤ OPT(G). Another possibility outlined in [4] is to replace every required 
edge $(u_i, u_j) \in D$ by introducing a new node $u_{ij}$ and adding two new edges $(u_i, u_{ij})$ and $(u_{ij}, u_j)$, 
running the approximation algorithm for Min-Ed on this new graph, and then replacing the edges 
$(u_i, u_{ij})$ and $(u_{ij}, u_j)$ in the solution by the original edge $(u_i, u_j)$. If an optimal solution of 
critical-Min-Ed on G uses p edges from E\D then this approach returns a solution (V, A) with 

$$|A| \le \left(\frac{\pi^2}{6} - \frac{1}{36} + \frac{1}{c(c-1)}\right)(2|D| + p) - |D| \approx 2.236\,|D| + 1.618\,p.$$


3.3. The Arborescence Approach [14] 

A (rooted) spanning out-arborescence of a directed edge-weighted graph G = (V, E) is a directed 
acyclic spanning sub-graph (V, A) of G such that every node except one node (the root) has exactly 
one incoming edge, and the weight of such an out-arborescence is the sum of the weights of its edges. 
A spanning in-arborescence is defined analogously except that every node except the root has exactly 
one outgoing edge. An exact polynomial-time algorithm for computing a spanning in-arborescence 
or spanning out-arborescence of minimum weight was provided in [15,16,20]. An 
overview of this algorithm for computing a minimum weight out-arborescence (as formulated in [16]) 
is as follows. We first remove all incoming edges to the root v. Then we 
select for each node, except the root v, an incoming edge of minimum weight. If these edges do not 
give a spanning arborescence, then there must be a (directed) cycle C formed by a subset of these 
edges. Let w(C) = min{w(e) | e ∈ C}. We contract the cycle C to a "mega"-node, and decrease the 
weight of every edge (u, x) from a node u ∉ C to a node x ∈ C by a − w(C), where a is the weight of 
the unique edge in C that is incoming to x. The process is then repeated on the reduced graph, and 
continued until we have a spanning arborescence on the remaining graph. The mega-nodes are then 
expanded in the reverse order. Each time a mega-node is expanded, exactly one of its edges that would 
produce two incoming edges to a node is discarded. A minimum weight in-arborescence can be 
computed by the same algorithm if we reverse the direction of all the edges of the input graph. 
See Figure 3 for an illustration. 
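
The select-and-contract loop described above can be sketched in Python as follows. This is a simplified sketch rather than the formulation of [16]: it uses the common variant that subtracts the full weight a of the selected incoming edge (instead of a − w(C)) and charges a to the running total, and it returns only the weight of a minimum weight out-arborescence, skipping the expansion of the mega-nodes. The function name and the representation of nodes as integers 0, ..., n − 1 are assumptions.

```python
def min_out_arborescence_weight(n, root, edges):
    """Weight of a minimum weight spanning out-arborescence rooted at `root`.

    edges is a list of (u, v, w) triples over nodes 0..n-1.
    Returns None if some node cannot be reached from the root.
    """
    total = 0
    while True:
        # Select a cheapest incoming edge for every node except the root.
        best, pre = [None] * n, [None] * n
        for u, v, w in edges:
            if u != v and v != root and (best[v] is None or w < best[v]):
                best[v], pre[v] = w, u
        if any(best[v] is None for v in range(n) if v != root):
            return None
        best[root] = 0

        # Follow the selected edges backwards to detect cycles.
        comp, mark, count = [None] * n, [None] * n, 0
        for v in range(n):
            total += best[v]
            x = v
            while x != root and comp[x] is None and mark[x] != v:
                mark[x] = v
                x = pre[x]
            if x != root and comp[x] is None:      # x lies on a newly found cycle
                y = pre[x]
                while y != x:
                    comp[y] = count
                    y = pre[y]
                comp[x] = count
                count += 1
        if count == 0:
            return total                            # selected edges form an arborescence

        # Contract every cycle to a mega-node and reduce the edge weights.
        for v in range(n):
            if comp[v] is None:
                comp[v] = count
                count += 1
        edges = [(comp[u], comp[v], w - best[v])
                 for u, v, w in edges if comp[u] != comp[v] and v != root]
        root, n = comp[root], count
```

A minimum weight in-arborescence weight can be obtained from the same function by reversing every edge, as noted in the text.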

For weighted-Min-Ed, Frederickson and JaJa [14] proposed the following simple algorithm that 
gives a 2-approximation for an input graph G = (V, E): 

Select an arbitrary node v of G 
Find a minimum weight spanning in-arborescence $(V, A_1)$ of G rooted at v 
Find a minimum weight spanning out-arborescence $(V, A_2)$ of G rooted at v 
Return $(V, A_1 \cup A_2)$ as the solution 

Figure 3. An illustration of the algorithm to compute a minimum weight spanning 
out-arborescence. The thick black edges at the final fourth step are the edges in the solution. 

The above solution is a valid solution since we can reach any node $v_j$ starting from any node $v_i$ by 
taking a path from $v_i$ to the root v followed by a path from v to the node $v_j$. The solution is a 
2-approximation since any valid solution of weighted-Min-Ed contains both a spanning 
in-arborescence and a spanning out-arborescence rooted at v, and thus 
$\mathrm{OPT}(G) \ge \max\{w(A_1), w(A_2)\}$, where $w(A_i)$ denotes the total weight of the edges in $A_i$. 
A simple example of an input graph was also provided in [14] for which the above 
algorithm provides a solution of total weight 2 OPT(G). 

For critical-Min-Ed, a very similar approach as described below can be used to again provide a 
2-approximation for an input graph G = (V, E): 

Define the weight w(e) of an edge e ∈ E as $w(e) = 0$ if $e \in D$ and $w(e) = 1$ otherwise 
Select an arbitrary node $v_r$ of G 
Find a minimum weight spanning in-arborescence $T = (V, A_1)$ of G rooted at node $v_r$ 
Redefine the weight w(e) of an edge e ∈ E as $w(e) = 0$ if $e \in D \cup A_1$ and $w(e) = 1$ otherwise 
Find a minimum weight spanning out-arborescence $T = (V, A_2)$ of G rooted at node $v_r$ 
Return $(V, A_1 \cup A_2 \cup D)$ as the solution 

Albert et al. [4] showed how to modify the above algorithm and combine it with any 
p-approximation algorithm for Min-Ed to obtain an improved algorithm for critical-Min-Ed with an 
approximation ratio of 3 − 2/p. Currently, the best known value of p is 1.5, which leads to a 
5/3-approximation for critical-Min-Ed using this approach. 
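
For the special case in which every edge has weight 1, every spanning arborescence has exactly |V| − 1 edges and is therefore of minimum weight, so the arborescence approach of this subsection can be sketched with two breadth-first searches. The sketch below covers only this unit-weight case and assumes a strongly connected input; the general weighted case needs a minimum weight arborescence routine instead. The function names are hypothetical.

```python
from collections import deque


def bfs_tree_edges(nodes, edges, root):
    """Edges of a spanning out-arborescence rooted at `root`, found by breadth-first search."""
    succ = {u: [] for u in nodes}
    for u, v in edges:
        succ[u].append(v)
    tree, seen, queue = set(), {root}, deque([root])
    while queue:
        u = queue.popleft()
        for v in succ[u]:
            if v not in seen:
                seen.add(v)
                tree.add((u, v))
                queue.append(v)
    if len(seen) != len(set(nodes)):
        raise ValueError("input graph is not strongly connected")
    return tree


def two_approx_min_ed(nodes, edges, root=None):
    """Union of an in-arborescence and an out-arborescence rooted at the same node."""
    nodes = list(nodes)
    root = nodes[0] if root is None else root
    out_arb = bfs_tree_edges(nodes, edges, root)
    reversed_edges = [(v, u) for u, v in edges]
    # A tree edge (a, b) of the reversed graph corresponds to the original edge (b, a).
    in_arb = {(b, a) for a, b in bfs_tree_edges(nodes, reversed_edges, root)}
    return out_arb | in_arb
```

The returned set has at most 2(|V| − 1) edges, while any valid solution on a strongly connected graph with at least two nodes needs at least |V| edges, which gives the factor of 2 in this special case.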
3.4. From Critical-Min-Ed and Critical-Max-Ed to Min-Btr and Max-Btr [4,13] 

The results in [4,13] show how to transform a solution to critical-Min-Ed (respectively, 
critical-Max-Ed) to a solution to Min-Btr (respectively, Max-Btr) by adding a single edge that can be 
found in polynomial time. (We remind the reader that we assume that the input graph is strongly 
connected.) The idea behind this is as follows. We can distinguish our input (and strongly connected) 
graph G based on whether G = (V, E) has a cycle of parity −1 (double parity graph) or not (single parity 
graph). Whether G is a single or double parity graph can be easily checked in $O(|V|^3)$ time by using a 
simple modification of the well-known Floyd-Warshall transitive closure algorithm [8] as outlined 
in [4]. Now we can observe the following: 

• If G is a single parity graph then for every pair of nodes $u_i, u_j \in V$, exactly one of the two 
paths $u_i \stackrel{1,E}{\Rightarrow} u_j$ and $u_i \stackrel{-1,E}{\Rightarrow} u_j$ exists. Then, we can simply ignore the edge labels and compute a 
solution (V, A) of critical-Min-Ed (respectively, critical-Max-Ed) on G. It can be seen that 
(V, A) also provides a valid solution for Min-Btr (respectively, Max-Btr). 

• Otherwise, G is a double parity graph. We again first ignore the edge labels and compute a 
solution (V, A) of critical-Min-Ed (respectively, critical-Max-Ed) on G. Note that (V, A) 
contains a rooted arborescence, say $(V, A_1)$ with $A_1 \subseteq A$, rooted at some node $u_r$. We label each 
node $u_i \in V$ with $\ell(u_i) = \ell(P_i)$ where $P_i$ is the unique path in $(V, A_1)$ from $u_r$ to $u_i$. Since G is a 
double parity graph, there must exist an edge $(u_i, u_j) \in E$ such that $\ell(u_i)\,\ell(u_i, u_j) \ne \ell(u_j)$, and 
adding this edge (if not already present) to A produces a valid solution of Min-Btr or 
Max-Btr for G. 
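
The node-labelling argument in the second case also suggests a simple test for double parity. The sketch below is an illustration under stated assumptions rather than the procedure of [4], which uses a modification of the Floyd-Warshall algorithm: it assumes that G is strongly connected, labels the nodes along a breadth-first tree of E rather than along an arborescence contained in A, and looks for an edge whose label is inconsistent with the node labels. The function names are hypothetical.

```python
from collections import deque


def parity_labels(nodes, labeled_edges, root):
    """Label each node with the parity of its tree path from `root` (root has label 1)."""
    succ = {u: [] for u in nodes}
    for (u, v), lab in labeled_edges.items():
        succ[u].append((v, lab))
    label, queue = {root: 1}, deque([root])
    while queue:
        u = queue.popleft()
        for v, lab in succ[u]:
            if v not in label:
                label[v] = label[u] * lab
                queue.append(v)
    return label


def find_parity_violating_edge(nodes, labeled_edges, root):
    """Return an edge (u, v) with label(u) * l(u, v) != label(v), or None if there is none.

    For a strongly connected graph, None means that every cycle has parity 1.
    """
    label = parity_labels(nodes, labeled_edges, root)
    for (u, v), lab in labeled_edges.items():
        if label[u] * lab != label[v]:
            return (u, v)
    return None
```

If no such edge exists, the node labels are consistent along every edge, so the parities around any cycle multiply to 1 and the graph is a single parity graph.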

3.5. Linear Programming Based Approach [13] 

We refer the reader to a standard graduate level textbook such as [21] for basic concepts and 
definitions related to linear programming and its applications to designing approximation algorithms. 

An exponential-size linear programming (LP) formulation for the minimum weight rooted 
(at node $u_r$) out-arborescence problem for an edge-weighted input graph G = (V, E) was provided by 
Edmonds [15] in the following manner. We use a binary indicator variable $x_e = x_{u_i,u_j}$ for every edge 
$e = (u_i, u_j) \in E$ which describes whether we select e ($x_e = 1$) or do not select e ($x_e = 0$) in our solution. 
For U ⊂ V, define $\iota(U) = \{(u_i, u_j) \in E : u_i \notin U \text{ and } u_j \in U\}$. Then, the LP formulation is: 

minimize $\sum_{e \in E} w(e)\,x_e$ 
subject to 
$\sum_{e \in \iota(U)} x_e \ge 1$ for all U such that $\emptyset \subset U \subset V$ and $u_r \notin U$ 
$x_e \ge 0$ for all $e \in E$ 

Edmonds [15] showed that the above LP always has an integral optimal solution (i.e., an optimal 
solution with $x_e \in \{0, 1\}$ for all e ∈ E) which provides an optimal solution for our minimum weight 
rooted out-arborescence problem. Note that the above LP has $O(2^{|V|})$ constraints in the worst case. 
However, the advantage of such a linear programming formulation is that we can now make use of powerful 
mathematical tools, such as the duality theorem, from the theory of linear programming. 

We can modify the above LP formulation to a primal LP formulation $P_1$ for Min-Ed provided we set 
w(e) = 1 for all e ∈ E and we remove "and $u_r \notin U$" from the condition in the first constraint. The dual 
program $D_1$ of this LP can be constructed by having a variable $y_U$ for every $\emptyset \subset U \subset V$. Both the 
primal and the dual LP are written down below for clarity. 



(primal LP $P_1$) 
minimize $\sum_{e \in E} x_e$ 
subject to 
$\sum_{e \in \iota(U)} x_e \ge 1$ for all U such that $\emptyset \subset U \subset V$ 
$x_e \ge 0$ for all $e \in E$ 

(dual LP $D_1$) 
maximize $\sum_{\emptyset \subset U \subset V} y_U$ 
subject to 
$\sum_{U : e \in \iota(U)} y_U \le 1$ for every edge $e \in E$ 
$y_U \ge 0$ for all $\emptyset \subset U \subset V$ 

We can change $P_1$ into an LP formulation for Max-Ed if we replace the objective "minimize 
$\sum_{e \in E} x_e$" by "maximize $|E| - \sum_{e \in E} x_e$", and change the dual $D_1$ accordingly to reflect this change. We 
can further change this formulation for Min-Ed and Max-Ed to critical-Min-Ed and critical-Max-Ed, 
respectively, by adding a constraint $x_e \ge 1$ for every edge e ∈ D. 

Note that $P_1$ does not provide a valid solution of the Min-Ed problem unless the constraint $x_e \ge 0$ for 
every edge e ∈ E is replaced by the constraint $x_e \in \{0, 1\}$, resulting in an integer linear program (ILP) 
whose exact solution is in general NP-hard to compute. We will denote this ILP corresponding to $P_1$ by $IP_1$. 
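
To make the exponential family of constraints of $P_1$ concrete, the following sketch enumerates every proper non-empty subset U of a very small graph and checks whether a given assignment x satisfies all of the corresponding constraints. It is a brute-force illustration for tiny inputs only, not a method for solving the LP; the function names and the numerical tolerance are assumptions.

```python
from itertools import combinations


def entering_edges(U, edges):
    """The set iota(U): edges that start outside U and end inside U."""
    U = set(U)
    return [(u, v) for u, v in edges if u not in U and v in U]


def cut_constraints(nodes, edges):
    """Yield (U, iota(U)) for every proper non-empty subset U of the nodes."""
    nodes = list(nodes)
    for k in range(1, len(nodes)):
        for U in combinations(nodes, k):
            yield U, entering_edges(U, edges)


def is_feasible(nodes, edges, x, tol=1e-9):
    """Check x_e >= 0 for all edges and sum of x_e over iota(U) >= 1 for all U."""
    if any(x[e] < -tol for e in edges):
        return False
    return all(sum(x[e] for e in cut) >= 1 - tol
               for _, cut in cut_constraints(nodes, edges))
```

On the complete digraph with three nodes, for example, the assignment $x_e = 1/2$ for every edge is feasible with objective value 3, the same value as the integral solution given by a directed triangle.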

3.5.1. Applying LP-Based Approach to Critical-Min-Ed 

We provide a high-level overview of the primal-dual approach used in [13] for critical-Min-Ed on 
an input graph G = (V, E). 

1. We start with an initial assignment of values to variables in $IP_1$ in the following manner. We 
keep only a subset of constraints of $IP_1$ such that the resulting ILP can be solved exactly in 
polynomial time, giving an optimal solution $A_1 \subseteq E$. Then, it follows that $\mathrm{OPT}(G) \ge |A_1|$. 

2. However, $(V, A_1)$ may not be a valid solution for critical-Min-Ed on G (i.e., of $IP_1$). Then, we try 
to make $A_1$ a valid solution by adding and/or removing edges so that we use a total of at most 
$\frac{3}{2}\eta - 1$ edges where $\mathrm{OPT}(G) \ge \eta \ge |A_1|$, giving a 3/2-approximation for critical-Min-Ed. The 
edge alteration procedure was carried out in [13] using the DFS (depth-first-search) algorithm 
as originally outlined in a seminal paper by Tarjan (e.g., see the textbook [22]). 

The initial solution $A_1$ referred to above in Step 1 is obtained in the following manner. For U ⊂ V, 
define $o(U) = \{(u_i, u_j) \in E : u_i \in U \text{ and } u_j \notin U\}$. Call a constraint of type $\sum_{e \in \iota(U)} x_e \ge 1$ in $IP_1$ "tractable" 
if for some node $u_i$ either $\iota(U) \subseteq \iota(\{u_i\})$ or $\iota(U) \subseteq o(\{u_i\})$. It was shown in [13] that the set of tractable 
constraints of $IP_1$ can be found easily and the resulting ILP can be solved exactly using any algorithm 
that finds a maximum matching in a bipartite graph. Figure 4 shows an example of the initial solution 
$A_1$ found by this approach. 

Figure 4. An illustration of the initial solution $A_1$ discussed in the algorithm that applies an 
LP-based approach to critical-Min-Ed in Section 3.5.1: (a) The input graph G; (b) The 
edges in the initial solution $A_1$. As one can see, the initial solution does not provide a valid 
solution of critical-Min-Ed since the graph in Figure 4b is not strongly connected, but the 
final solution is obtained by adding and deleting edges from this initial solution. 

The DFS-based edge addition/removal method referred to in Step 2 is highly technical with 
elaborate case analysis and is beyond the scope of this review paper. In a nutshell, difficulties may 
arise because in some cases the algorithm may be forced to use more than $\frac{3}{2}|A_1| - 1$ edges. Then, we 
look at the "non-tractable" constraints of the primal $P_1$ or dual $D_1$ to get an improved lower bound η 
for OPT(G) (i.e., $\mathrm{OPT}(G) \ge \eta > |A_1|$) to ensure that we use at most $\frac{3}{2}\eta - 1$ edges. In the proof we need 
to crucially use the weak-duality theorem of linear programming which states that if $\mathrm{OPT}(P_1)$ and $\mathrm{OPT}(D_1)$ 
are the objective values of an optimal solution of $P_1$ and $D_1$, respectively, then $\mathrm{OPT}(P_1) \ge \mathrm{OPT}(D_1)$. 

3.5.2. Applying LP-Based Approach to Critical-Max-Ed 

We provide an overview of the 2-approximation algorithm for critical-Max-Ed on an input graph 
G = (V, E) using an LP-based approach as described in [13]. Call an edge e ∈ E a necessary edge if 
either e ∈ D or $\iota(U) = \{e\}$ for some U ⊂ V, and let F be the set of necessary edges. If the edges in F 
provide a valid solution of critical-Max-Ed on G then (V, F) provides us with an optimal solution; thus 
assume that this is not the case below. In this case, no edge of F enters U for some $\emptyset \subset U \subset V$, so there must be a 
node $u_r$ such that no edges in F enter $u_r$. As a pre-processing step, we repeatedly contract a cycle of 
necessary edges until no such cycles remain. Let $\mathrm{OPT}_{\text{in-arb}}(G)$ be the total weight of a minimum-weight 
in-arborescence of G rooted at $u_r$. Consider the LP formulation for the minimum weight rooted 
out-arborescence problem as defined before: 



Biology 2014, 3 






minimize ^ w(e)x^ 
subject to 

^ > 1 for all U such that O c U c V and u^^U 
> 0 for all e e E 

1 ' otherwise ' suppose that we set = | y ' Qj^^g^jse • This assignment of 

variables is a valid solution of the above LP. 

Now, compute a minimum weight out-arborescence $T_{out} = (V, A_{out})$ rooted at $u_r$. If there are $z + 1$ 
edges in E that are not in $A_{out}$, then $OPT(G) \le z$. Suppose now that we change $w(e)$ for every $e \in A_{out}$ to 
zero and keep the other weights unchanged. Our previous fractional solution, namely $x_e = 1$ if $e \in F$ and 
$x_e = 1/2$ otherwise, is still a valid solution of the LP, and thus the total value of the objective function 
of this fractional solution is at most $\frac{z+1}{2}$, which together with the result of Edmonds [15] that showed 
that "the LP always has an integral optimal solution" implies that $OPT_{in\text{-}arb}(G) \le \frac{z+1}{2}$. This implies 
that we can delete at least $z + 1 - \frac{z+1}{2} = \frac{z+1}{2}$ edges: we take the edges of the in-arborescence 
together with all the edges in $A_{out}$ to get a valid solution of critical-MAX-ED on G. 
The total number of edges we have deleted is at least $\frac{z+1}{2} > \frac{OPT(G)}{2}$. A slight modification of the 
argument gives a slightly stronger bound on the number of deleted edges; see [13]. 
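The structural fact behind this argument is that the union of an out-arborescence and an in-arborescence rooted at the same node is strongly connected, so every edge outside the union can be deleted. The sketch below illustrates only that fact: it deliberately substitutes plain breadth-first-search arborescences for the minimum weight arborescences of the algorithm, ignores critical edges, and uses a hypothetical toy graph, so it carries no approximation guarantee.

```python
from collections import deque


def bfs_arborescence(nodes, edges, root, reverse=False):
    """Edge set of a BFS arborescence rooted at `root`.

    reverse=False gives an out-arborescence (root reaches every node);
    reverse=True gives an in-arborescence (every node reaches root).
    """
    incident = {v: [] for v in nodes}
    for (u, v) in edges:
        incident[v if reverse else u].append((u, v))
    seen, tree, queue = {root}, set(), deque([root])
    while queue:
        a = queue.popleft()
        for (u, v) in incident[a]:
            b = u if reverse else v
            if b not in seen:
                seen.add(b)
                tree.add((u, v))
                queue.append(b)
    if seen != set(nodes):
        raise ValueError("root cannot reach (or be reached from) every node")
    return tree


def strongly_connected(nodes, edges):
    """True if some fixed node reaches, and is reached by, every node."""
    root = next(iter(nodes))
    try:
        bfs_arborescence(nodes, edges, root)
        bfs_arborescence(nodes, edges, root, reverse=True)
    except ValueError:
        return False
    return True


# Hypothetical strongly connected toy graph.
nodes = {1, 2, 3, 4}
edges = [(1, 2), (2, 3), (3, 1), (1, 3), (3, 4), (4, 1), (2, 4)]
root = 3

a_out = bfs_arborescence(nodes, edges, root)
a_in = bfs_arborescence(nodes, edges, root, reverse=True)
kept = a_out | a_in
deleted = set(edges) - kept
z = len(edges) - len(a_out) - 1  # z + 1 edges lie outside A_out
print(sorted(deleted), z, strongly_connected(nodes, kept))
```

Replacing the breadth-first in-arborescence by a minimum weight one (with the edges of $A_{out}$ given weight zero) is what yields the bound on the number of deleted edges discussed above.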

3.5.3. Limitations of LP-Based Approaches 

A standard way of understanding the limitations of any LP-based approach for designing 
approximation algorithms is to measure the integrality gap, i.e., the ratio of the objective value of an 
optimal integral solution to that of an optimal fractional solution for a minimization problem and the 
ratio of the objective value of an optimal fractional solution to that of an optimal integral solution for a 
maximization problem [21]. In [13] it was shown that the integrality gap for P1 was at least 4/3 by 
giving an explicit construction of an input graph for which this ratio is achieved. The same input graph 
also shows that the integrality gap for the modification of P1 corresponding to MAX-ED is at least 3/2. 

4. Biological Applications 

In this section, we discuss three applications of transitive reduction problems in computational 
biology and bioinformatics. For other non-biology applications of transitive reduction problems, such 
as in visualization of Enron email networks or in connectivity issues of computer networks, the reader 
may consult appropriate references such as [11,23]. 

We briefly review the standard regulatory network model that was mentioned in Section 1.1.3 in 
connection with the Min-Btr and Max-Btr problems. A regulatory network is described by an 
edge-labelled directed graph G = (V, E) in which nodes represent individual components of the 
biological system and a (directed) edge of the form $(u_i, u_j)$ indicates that node $u_i$ has an influence on 
node $u_j$. The edge labelling function $\ell : E \to \{-1, 1\}$ indicates the nature of the causal relationship, with 
$\ell(u_i, u_j) = 1$ and $\ell(u_i, u_j) = -1$ indicating that $u_i$ has an excitatory (positive) and inhibitory (negative) 
influence on $u_j$, respectively; pictorially, it is quite common to denote an excitatory and an inhibitory 
edge by → and ⊣, respectively. This representation applies to both gene regulatory networks 
(describing the regulation of gene transcription and related processes) and signal transduction networks 
(describing the information flow from external signals to within-cell components). Some examples of 
large biological networks include: 

• Mammalian network of signaling pathways and cellular machines in the hippocampal CA1 
neuron having 512 nodes and 1,047 edges [24]. 

• S. cerevisiae transcriptional regulatory network of interactions between transcription factor 
proteins and genes having 690 nodes and 1,082 edges [25]. 

• C. elegans metabolic network having 651 nodes and 2,040 edges [26]. 

• Oriented version of an unweighted PPI network constructed from S. cerevisiae interactions in 
the BioGRID database having 786 nodes and 2,453 edges [27]. 

Existence of such large networks rules out exact brute-force calculations of optimal solutions 
of transitive reduction problems and provides motivations to explore approximation algorithms for 
these problems. 
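The edge-labelled model reviewed at the start of this section is easy to represent directly. The following minimal sketch assumes, as is usual for this model, that the net effect of a directed path is the product of its edge labels (so two inhibitions in sequence act as an excitation); the dictionary representation, the function name, and the four-node network are hypothetical.

```python
def path_sign(labels, path):
    """Product of the edge labels (+1 excitatory, -1 inhibitory) along `path`."""
    sign = 1
    for u, v in zip(path, path[1:]):
        sign *= labels[(u, v)]
    return sign


# Hypothetical four-node regulatory network: a -> b, b -| c, c -| d, a -| d.
labels = {("a", "b"): 1, ("b", "c"): -1, ("c", "d"): -1, ("a", "d"): -1}

print(path_sign(labels, ["a", "b", "c"]))       # one inhibition on the path
print(path_sign(labels, ["a", "b", "c", "d"]))  # two inhibitions cancel out
```

In this toy network the direct edge from a to d is inhibitory while the path through b and c is excitatory, so the two are not interchangeable when reachability is compared by sign.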

4.1. Network Construction and Simplification from Direct and Double-Causal Data 

Signal transduction and gene regulatory networks are crucial to the maintenance of cellular 
homeostasis and for cell behavior such as growth, survival, apoptosis, and movement. Deregulation of 
these networks is a key contributor to many disease processes such as developmental disorders, 
diabetes, vascular diseases, and cancer. In a signal transduction network (pathway), there is typically 
an input, perceived by a receptor, followed by a series of elements through which the signal percolates 
to the output node, which represents the final outcome of the signal transduction process. For a cellular 
signal transduction pathway not involving alterations in gene expression, elements often consist of 
proteinaceous receptors, intermediary signaling proteins and metabolites, effector proteins, and a final 
output, which represents the ultimate combined effect of the effector proteins. If the signal 
transduction process includes regulation of the transcript level of a particular gene, the intermediate 
signaling elements will also include the gene itself and the transcription factors that regulate it, as well 
as any small RNAs that regulate the transcript's abundance, with the final output being presence or 
absence of transcripts. Genome-wide experimental methods now identify interactions among 
thousands of proteins [28-34]. However, the state-of-the-art understanding of many signaling processes 
is often limited to the knowledge of key mediators and of their positive or negative effects on the 
whole process. The experimental evidence about the involvement of specific components in a given 
signal transduction network frequently belongs to one of these two categories: 

(i) "Direct" interactions corresponding to biochemical evidence that provides information on 
enzymatic activity or protein-protein interactions and represents direct physical interactions. 
An interaction of this type is of the form "A promotes B" or "A inhibits B", and is represented 
in the usual manner by a directed edge A → B and A ⊣ B, respectively. Edges corresponding 
to known (documented) direct interactions are marked as "critical" and belong to the set D of 
required edges. 

(ii) "Putative" interaction patterns that arise, for example, during differential responses to a 
stimulus, which in a wild-type organism versus a mutant organism implicates the product of the 
mutated gene in the signal transduction process. This type of interaction pattern is not a direct 
interaction but rather corresponds to an indirect (double-causal) relationship most likely 
resulting from a chain of direct interactions and reactions, and is a 3-component inference 
represented by a small-size sub-graph among three or four nodes. 

As noted above, inference of type (ii) may not give direct interactions but indirect causal 
relationships that correspond to reachability relationships in the unknown interaction network for 
which the Min-Btr and Max-Btr problems become directly applicable. More precisely, inferences of 
type (ii) typically lead to double-causal inferences of the type "C promotes the process through which 
A promotes B", and may correspond to an intersection of two paths (one path from A to B and another 
path from C to B) in the interaction network (i.e., C is assumed to activate an unknown intermediary 
node of the A to B path). 

The research works in [5-7] led to the development of an efficient and accurate method 
incorporating all relevant biological knowledge for synthesizing path-level information into a 
consistent network by constructing a minimal graph that maintains all reachability relationships 
without requiring expression information (unlike, say, many reverse-engineering approaches). 
Methods prior to [5-7] for synthesizing signal transduction networks, such as [28], only included 
direct biochemical interactions and were therefore restricted by the incompleteness of the experimental 
knowledge on pairwise interactions. Key steps in the network synthesis method developed in [5-7] are 
schematically shown in Figure 5. The first step is a distillation of experimental conclusions into 
qualitative regulatory relations between cellular components. (This is a complex process by itself. It is 
important to note that human intervention will inevitably be an important component of the literature 
curation process even though automated text search engines such as GENIES [32] become more and 
more popular.) Direct biochemical and pharmacological evidence, such as "A promotes B", is 
incorporated as a directed edge (A, B). Other kinds of double-causal evidence (such as genetic 
evidence of differential responses to a stimulus) are handled in the third step in the schematic 
diagram. For the sake of concreteness, assume that such a double-causal interaction is of the form 
"C promotes the process through which A promotes B". The only way such a double-causal interaction 
may correspond to a direct interaction is if C is an enzyme catalyzing a reaction in which A is 
transformed into B, and for this case the interaction can be represented as both A (the substrate) and C 
(the enzyme) activating B (the product), i.e., by two edges A → B and C → B. If the interaction 
between A and B is direct and C is not a catalyst of the interaction between A and B, we can assume 
that C activates A. In all other cases, this type of interaction is represented as an intersection of two paths 
(A to B and C to B) in the interaction network by introducing new nodes (called "pseudo-nodes" in [5] 
and elsewhere since they are added only to satisfy the pathway properties). One important algorithmic 
idea in this network synthesis method is that of finding a minimal network, in terms of the number of 
non-critical edges (i.e., edges not in D), that is consistent with all (directed) reachability relationships 
between nodes; this is captured by the Min-Btr and Max-Btr problems discussed earlier. (Intuitively, by 
computing a minimal graph we want to be as close as possible to a "tree-like topology" while supporting 
all experimental observations. An implicit assumption of chain-like or tree-like topologies permeates the 
traditional molecular biology literature, e.g., signal transduction and metabolic pathways are assumed 
to be close to linear chains and genes are assumed to be regulated by one or two transcription 
factors [33].) For further details, see [5-7]. A software package named NET-SYNTHESIS 
incorporating the method shown in Figure 5 using some of the algorithmic ideas described for 
Min-Btr and Max-Btr in Section 3 was first reported in [5,6] and is freely available for download. 
The input to NET-SYNTHESIS is a list of relationships among biological components (direct and 
double causal), and its output is a network diagram and a text file with the edges of the signal 
transduction network. 
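The three cases for encoding a double-causal statement "C promotes the process through which A promotes B" can be sketched as follows. This is an illustration of the rules described above, not the NET-SYNTHESIS implementation: the function name, its flags, and the naming scheme for pseudo-nodes are hypothetical, and only promoting (excitatory) relationships are handled.

```python
def encode_double_causal(edges, a, b, c, c_is_enzyme=False, a_b_direct=False):
    """Return a new edge set encoding 'c promotes the process through which a promotes b'.

    Every edge (u, v) is excitatory; inhibitory statements are out of scope here.
    """
    result = set(edges)
    if c_is_enzyme:
        # c catalyzes the reaction a -> b: substrate and enzyme both activate b.
        result |= {(a, b), (c, b)}
    elif a_b_direct:
        # a -> b is a known direct interaction and c is not its catalyst:
        # assume c activates a (the direct edge is expected to be present already).
        result.add((c, a))
    else:
        # Otherwise the a-to-b and c-to-b paths meet at a new pseudo-node.
        pseudo = f"pseudo[{c}|{a}->{b}]"
        result |= {(a, pseudo), (pseudo, b), (c, pseudo)}
    return result


print(sorted(encode_double_causal(set(), "A", "B", "C", c_is_enzyme=True)))
print(sorted(encode_double_causal({("A", "B")}, "A", "B", "C", a_b_direct=True)))
print(sorted(encode_double_causal(set(), "A", "B", "C")))
```

In the third case the pseudo-node stands for the unknown intermediary on the path from A to B that C is assumed to activate; a later reduction step may merge or remove such nodes.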

Figure 5. A schematic diagram of the network synthesis method in [5-7]. Human 
interaction is necessary since some choices may have to be made in distilling the 
component relationships, e.g., when there are conflicting reports in the literature. 



4.1.1. Applications in Agronomic Research 

Guard cells are central components in control of plant water status [34] and better understanding of 
their regulation is imperative for the goal of engineering of crops with improved drought tolerance. 
Plants both lose water and take in carbon dioxide through microscopic stomatal pores, each of which is 
regulated by a surrounding pair of guard cells. During drought, the plant hormone abscisic acid (ABA) 
inhibits stomatal opening and promotes stomatal closure, thereby promoting water conservation. ABA 
signal transduction in guard cells is one of the best characterized signaling systems in plants with many 
signal transduction proteins, secondary metabolites and ion channels having been identified to 
participate in the process [35-37]. 

The research works in [5,6] used the NET-SYNTHESIS software to generate a network for 
ABA-induced closure from a list of about 140 interactions and causal inferences for ABA-induced 
closure published in Table S1 and Text S1 in [38]. A detailed comparison of this computer-generated 
network with a manually curated network for ABA-induced closure published in [38] validated the 
accuracy of the algorithms for Min-Btr used in the software. 

4.2. Analyzing Disease Networks (Biomedical Application) 

Large Granular Lymphocytes (LGL) are medium to large size cells with eccentric nuclei and 
abundant cytoplasm. In normal adults, LGL comprise 10%–15% of the total peripheral blood 
mononuclear cells. The disease LGL leukemia is a disordered clonal expansion of LGL and their 
invasion of the marrow, spleen and liver. Ras is a small GTPase that is essential for controlling 
multiple signaling pathways, and its deregulation is frequently seen in human cancers. 
Activation of H-Ras requires its farnesylation, which can be blocked by farnesyltransferase inhibitors 
(FTIs). This makes FTIs candidate drugs for future anti-cancer therapies. One of these FTIs is 
tipifarnib, which induces apoptosis in leukemic LGL in vitro. This observation, together 
with the finding that Ras is constitutively activated in leukemic LGL cells, leads to the hypothesis that 
Ras plays an important role in LGL leukemia, and may function through influencing the Fas/FasL pathway. 

Kachalo et al. in [6] used the NET-SYNTHESIS software together with its specific transitive 
reduction algorithms to synthesize a cell-survival/cell-death regulation related signaling network from 
the Transpath 6.0 database with additional information manually curated from literature search, having 
359 nodes representing proteins/protein families and mRNAs participating in pro-survival and 
Fas-induced apoptosis pathways and 1,295 edges representing regulatory relationships between nodes, 
including protein interactions, catalytic reactions, transcriptional regulation and known double-causal 
regulations. Using Min-Btr and other algorithms, they were able to reduce the size of the original 
network to 267 nodes and 751 edges to focus special interest on the effect of Ras on apoptosis 
response through the Fas/FasL pathway that involve the 33 known T-LGL deregulated proteins. Further 
work in this direction was done by Zhang et al. in [39] in building and analyzing a network 
model of signaling components of survival of cytotoxic T lymphocytes in LGL leukemia using the 
NET-SYNTHESIS software. 

For further applications of transitive reduction problems to drug target identification, see [40]. 

4.3. Measuring Topological Redundancy of Biological Networks 

The concept of redundancy is well known in information theory. Informally, redundancy refers to 
identical elements performing the same function. (There are also other definitions of the redundancy 
concept in the context of other biological applications that are completely different from ours. For 
example, in some contexts redundancy refers to paralogous genes that provide functional backup for 
one another [41].) In computer networks and electronic systems, such measures are useful in analyzing 
properties such as fault-tolerance. It is an accepted fact that biological networks do not necessarily 
have the lowest possible degeneracy or redundancy. For example, the connectivity of neurons in brains 
suggests a high degree of degeneracy [42]. As Tononi, Sporns and Edelman observed in [43], a 
specific and useful notion of redundancy has yet to be firmly incorporated into biological thinking, 
often because of the lack of a suitable formal theoretical framework. A further reason for the lack of 
incorporation of these notions in biological thinking is the lack of computationally efficient procedures 
for computing these measures for large-scale networks even when formal definitions are available. 
Therefore, such studies are often done in a somewhat ad-hoc fashion, such as in [44]. There are 
notions of redundancy available in the field of analysis of undirected graphs based on clustering 
coefficients [45] or betweenness centrality measures [46]. However, such notions are not appropriate 
for the analysis of biological networks where we must distinguish positive from negative regulatory 
interactions or where we wish to study possible relationships of the dynamics of the network with 
its redundancy. 

Based on the Min-Btr and Max-Btr problems, Albert et al. in [47] proposed a new combinatorial 
measure of redundancy that is amenable to efficient algorithmic analysis. Note that binary transitive 
reduction of a graph (V, E) does not change pathway level information of the network and removes an 
edge $u_i \to u_j$ or $u_i \dashv u_j$ only when a similar alternate pathway from $u_i$ to $u_j$ (excitatory or 
inhibitory, respectively) exists, thus truly removing redundant connections. Thus, if $(V, E')$ is an 
optimal solution of Min-Btr and Max-Btr on the input graph G = (V, E) then $|E'|/|E|$ provides a measure 
of global compressibility of the network. Based on this intuition, Albert et al. in [47] proposed a new 
redundancy measure $R = 1 - \frac{|E'|}{|E|}$, where the |E| term in the denominator is simply a "min-max 
normalization" of the measure to ensure that $0 \le R \le 1$. Note that the higher the value of R is, the more 
redundant the network is. Since Min-Btr or Max-Btr can be computed efficiently, Albert et al. were 
able to evaluate R on a variety of large biological and directed social networks to derive interesting 
conclusions such as transcriptional networks are less redundant than signaling networks, directed social 
networks are more redundant than biological networks, the topological redundancy of the C. elegans 
metabolic network is largely due to its inclusion of currency metabolites and the redundancy of 
signaling networks is highly (negatively) correlated with the monotonicity of their dynamics. 
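As a rough illustration of how such a measure can be computed, the sketch below applies a simple greedy edge-removal heuristic to a signed digraph and evaluates $R = 1 - |E'|/|E|$ on the result. It is not the algorithm of [47]: the greedy reduction is only minimal (no further edge can be removed), not necessarily optimal, so the value it reports is a lower bound on R. It also compares alternate routes as walks of equal sign rather than simple paths, which is an assumption of this sketch, and all names and the toy network are hypothetical.

```python
from collections import deque
from fractions import Fraction


def has_signed_walk(labels, src, dst, sign):
    """True if some walk with at least one edge from src to dst has label product `sign`."""
    out = {}
    for (u, v), s in labels.items():
        out.setdefault(u, []).append((v, s))
    start = (src, 1)
    seen, queue = {start}, deque([start])
    while queue:
        node, acc = queue.popleft()
        for nxt, s in out.get(node, []):
            state = (nxt, acc * s)
            if state == (dst, sign):
                return True
            if state not in seen:
                seen.add(state)
                queue.append(state)
    return False


def greedy_reduction(labels, critical=frozenset()):
    """Drop each non-critical edge that an alternate same-sign walk can replace."""
    kept = dict(labels)
    for edge in sorted(labels):
        if edge in critical:
            continue
        sign = kept.pop(edge)
        if not has_signed_walk(kept, edge[0], edge[1], sign):
            kept[edge] = sign  # no replacement exists, so the edge stays
    return kept


def redundancy(labels, reduced):
    """R = 1 - |E'| / |E| as an exact fraction."""
    return Fraction(len(labels) - len(reduced), len(labels))


# Hypothetical network: a -> b -> c, shortcut a -> c, and a -| d, c -| d.
labels = {("a", "b"): 1, ("b", "c"): 1, ("a", "c"): 1,
          ("a", "d"): -1, ("c", "d"): -1}
reduced = greedy_reduction(labels)
print(sorted(reduced), redundancy(labels, reduced))
```

Here both shortcut edges leaving a are removed because a route of the same sign survives through b and c, giving R = 2/5 for this toy network.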

5. Conclusions 

In this review paper, we have elaborated on a few graph-theoretic problems that involve finding an 
"equivalent" sparser graph, explained several key mathematical and algorithmic tools that may be used to 
design efficient computational methods to solve these problems and then provided details of three 
biological applications of these problems. The idea of transitive reductions, in a more simplistic setting 
or in a different form, has also been used to identify structure of gene regulatory networks [48-52]. 
Of particular interest is a network "deconvolution" problem, considered by Feizi et al. [52], that is in 
some sense an inverse of the transitive reduction problems studied in this paper: their goal was to infer 
the original network given a set of direct (edge-level) and indirect (pathway-level) information about 
the graph. The authors of [52] showed that an exact closed-form solution of this problem can be 
found using an infinite-series summation. We hope that our review will lead to further interest in 
transitive reduction type problems and will promote further collaboration between the computational 
biology and the graph algorithms community. 
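For orientation, the infinite-series closed form mentioned above can be written as follows. This is a sketch of the formulation in [52], not a result of this review: the symbols $G_{\mathrm{dir}}$ (the unknown matrix of direct interactions) and $G_{\mathrm{obs}}$ (the observed matrix mixing direct and indirect effects) are not used elsewhere in this paper, and the series is assumed to converge, for example after scaling so that all eigenvalues of $G_{\mathrm{dir}}$ have absolute value below 1.

```latex
G_{\mathrm{obs}}
  = G_{\mathrm{dir}} + G_{\mathrm{dir}}^{2} + G_{\mathrm{dir}}^{3} + \cdots
  = G_{\mathrm{dir}} \left( I - G_{\mathrm{dir}} \right)^{-1}
\quad\Longrightarrow\quad
G_{\mathrm{dir}} = G_{\mathrm{obs}} \left( I + G_{\mathrm{obs}} \right)^{-1}
```

The second identity follows by multiplying the first on the right by $(I - G_{\mathrm{dir}})$ and solving for $G_{\mathrm{dir}}$.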